New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework mpeg4 check, add support for some more mpeg4 and ISO base media formats #206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried the PR with some of my samples, which are audio truncated real world files:
Sample | master | PR #206 |
---|---|---|
issue-120.m4b | mime=video/mp4, extension=mp4 | ❔mime=video/mp4, extension=mp4 |
issue-127.m4b | mime=audio/mp4, extension=m4a | ❌ FAILED |
issue-133.m4a | mime=audio/mp4, extension=m4a | ✔️mime=audio/mp4, extension=m4a |
issue-151.m4a | mime=video/mp4, extension=mp4 | ✔️mime=video/mp4, extension=mp4 |
Samples can be found here
I disagree that it fixes #165, it is not able to detect audio books. See also my comments: 1 & 2
@stroncium Can you attach the original (untruncated) files you used for the fixtures?
Should heave read better:
As for test cases, due to just parsing mpeg4 format and knowing that all the mentioned containers respect it, I don't think extensive testing is needed, and I didn't want to bloat repository with too much audio/video and waste time searching for samples, so I just generated minimal valid header boxes for each format.
That way the tests become a self fulfilling prophecy. Based on the consistency demonstrated so far, I think it is worth to test against a reasonable amount of real world MPEG-4 files.
@Borewit I actually tested it on your samples too. But now it seems like I screwed up one value when gathering all the code together( After fix, outputs for your samples.
All files match with
It is not actually valid mp4 file as per standard, as it uses not only non-registerred, but illegal brand Now, what to do with such files is a big question I forgot to raise. Now, going for matching with extensions you supplied,
I mean, there is no standard which defines what Though it all depends on perspective. I was going by assumption we want to produce extensions files should be saved with. If we want to guess which extensions they were saved with before, then I don't know what to do about it except gathering all the software around capable of producing mp4 and making matches on metadata it outputs. But that won't help in most cases. For example in |
Small mistake, well done.
These files are send in by users, and caused issues in my library. So probably not a fair set of test files neither. I truncated the audio prior adding them to the repository.
I agree.
This sample file is really based on an audio book. What do you mean with It also contains not only audio?
You make it sound like Apple owns the .m4a extension, it can be used for MPEG-4 audio files right? Does not necessary needs to be encoded by Apple right?
Let's capture those common marks, that is good enough in my opinion. I got suspicious on the ftype, because the MPEG-4 i had around did not comply with that at all. Some of them were very complete, and complex file structures, my humble assumption is then, the file has not been put together by world's dumbest idiot.
But if you say this sample is non representative I believe you and in that case it has mislead me big time.
Lets be realistic. No one is expecting magic. I can see that I have frustrated you, please don't be. I will try to find better samples, if someone can borrow me some testing material, that would be fantastic
Let's build the case on those common marks and patterns you seem to be able to recognize. |
@sindresorhus I don't seem to find any good documentation on any of those format. The sources I used are iso media standard, mpeg4 standard, www.ftyps.com, |
The file also contains video stream.
Sure doesn't, my point is was if we try to go with trying to gather some consistent part of it it will be things made by Apple. Otherwise we're doomed, if you google for something like What I've tried to do here was to make a good matcher for all the files respecting standards and unofficial standards but also provide a good fallback in case we don't match,
I can not guarantee they aren't representative as I have no statistics on their usage. I'm just trying to think logically about it.
Only technical difference is audiobook got chapters in it, but if concert wasn't youtube rip and/or was crafted a bit better, it well might have had too. Gathering common traces of known encoders might be a good goal for some use cases, but I'm not sure how reachable it is. Also, introducing that will probably require a change to multiple mime types return format, as for many uses such types will be simply wrong. To illustrate, this I think the first thing should be to match |
As for deeper detection, it's fairly easy to check if there is video stream with significant bitrate to separate video from audio when you have whole file. Separating audiobooks from audio is much more complex problem. Only solution I can think of is searching for keywords in metadata. But I think neither of those 2 features are in scope of |
@sindresorhus So, what should I do about readme?
|
@stroncium I would prefer to keep one entry per file extension, but you can just link to the same wiki article for all of them. |
@sindresorhus Readme done. I've also added description to |
Nice work @stroncium. @sindresorhus & @stroncium: I would like to use |
👍 Agreed, but can be done in a follow-up PR. |
Fixes #165
Fixes #99
WIP: gathering links for readme
Added
3g2
,m4a
,m4b
,m4p
,m4v
,f4a
,f4b
,f4p
,f4v
extensions.Gathered all ISO base media file format detectors together and simplified the code.
Added wildcard check for various mp4 files, but specific enough for false positives to be rare, 4 symbols exact
ftyp
match followed by 4 printable symbols match. Theoretically we might want to add exact matching for the printable symbols, but many of the types aren't registered and last compiled list I found was from 2009 (http://www.ftyps.com/),file
seems to use that same list.If my vision doesn't fail me, that means we support all mp4 versions supported by
file
and more(from wildcard).As for test cases, due to just parsing mpeg4 format and knowing that all the mentioned containers respect it, I don't think extensive testing is needed, and I didn't want to bloat repository with too much audio/video and waste time searching for samples, so I just generated minimal valid header boxes for each format.
For m4v and variants:
-
file
info tells that they havem4v
extenstion andvideo/x-m4v
mime- we had a test case for
mp4
extenstion for them- as per http://www.ftyps.com/ at least
M4V
brand is registered, though mime type is not- they may(and usually do, from what I gather) require additional capabilities compared to mp4
- so for the moment I set it
m4v
andvideo/x-m4v
(which was actually stated in readme, but wasn't detected), but it's probably worth some discussion- tests fixed to support it(fixture-m4v.mp4 -> fiture.m4v)